Feat: vibevoice asr #286

Merged

AnkushMalaker merged 8 commits into dev from feat/vibevoice-asr
Feb 7, 2026
Conversation


@AnkushMalaker AnkushMalaker commented Feb 6, 2026

Summary by CodeRabbit

  • New Features

    • Added Knowledge Graph service for managing entities, relationships, and promises extracted from conversations
    • Added multi-provider ASR support (VibeVoice, Faster-Whisper, NeMo, Transformers) with configurable capabilities
    • Added provider-based speaker diarization for enhanced speaker identification
  • Improvements

    • Enhanced memory context integration for more informed conversation summaries

- Removed cumulative audio offset tracking from StreamingTranscriptionConsumer as Deepgram provides cumulative timestamps directly.
- Updated store_final_result method to utilize Deepgram's cumulative timestamps without adjustments.
- Implemented completion signaling for transcription sessions in Redis, ensuring conversation jobs wait for all results before processing.
- Improved error handling to signal completion even in case of errors, preventing conversation jobs from hanging.
- Enhanced logging for better visibility of transcription completion and error states.
- Updated `config.yml.template` to include capabilities for ASR providers, detailing features like word timestamps and speaker segments.
- Added a new `vibevoice` provider configuration for Microsoft VibeVoice ASR, supporting speaker diarization.
- Enhanced `.env.template` with clearer provider selection and model configuration options, including CUDA settings and voice activity detection.
- Improved `docker-compose.yml` to support multiple ASR providers with detailed service configurations.
- Introduced common utilities for audio processing and ASR service management in the `common` module, enhancing code reusability and maintainability.
- Updated `README.md` to reflect the new provider-based architecture and usage instructions for starting different ASR services.
- Added support for the new `vibevoice` transcription provider, including configuration options for built-in speaker diarization.
- Updated `ChronicleSetup` to include `vibevoice` in the transcription provider selection and adjusted related descriptions.
- Enhanced the `ModelDef` and `Conversation` models to reflect the addition of `vibevoice` in provider options.
- Introduced a new capabilities management system to validate provider features, allowing conditional execution of tasks based on provider capabilities.
- Improved logging and user feedback in transcription and speaker recognition jobs to reflect the capabilities of the selected provider.
- Updated documentation to include details on the new `vibevoice` provider and its features.
- Introduced a new job for regenerating title and summary after memory processing to ensure fresh context is available.
- Updated the reprocess_transcript and reprocess_speakers functions to enqueue title/summary jobs based on memory job dependencies, improving job chaining and execution order.
- Enhanced validation for transcripts to account for provider capabilities, ensuring proper handling of diarization and segment data.
- Improved logging for job enqueuing and processing stages, providing clearer insights into the workflow and dependencies.
- Introduced support for Knowledge Graph functionality, enabling entity and relationship extraction from conversations using Neo4j.
- Updated `services.py` to manage Knowledge Graph profiles and integrate with existing service commands.
- Enhanced Docker Compose configurations to include Neo4j service and environment variables for Knowledge Graph setup.
- Added new API routes and models for Knowledge Graph operations, including entity and relationship management.
- Improved documentation and configuration templates to reflect the new Knowledge Graph features and setup instructions.
- Introduced new `knowledge_graph_routes.py` to handle API endpoints for managing knowledge graph entities, relationships, and promises.
- Updated `__init__.py` to include the new knowledge graph router in the main router module.
- Enhanced documentation to reflect the addition of knowledge graph functionality, improving clarity on available API routes and their purposes.
…d SDK directory

- Added entries for individual plugin config files to ensure user-specific settings are ignored.
- Included the SDK directory in .gitignore to prevent unnecessary files from being tracked.
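The Redis completion-signaling flow described in the summary above can be sketched roughly as follows. This is a minimal in-memory stand-in, not the backend's implementation: the real consumer would use the Redis client against the shared instance, only the key pattern `transcription:complete:{session_id}` is taken from this PR's description, and the function names here are illustrative.

```python
import time

# In-memory stand-in for the Redis completion flag. In the real backend this
# would be redis SET/EXISTS; key name mirrors transcription:complete:{session_id}.
_store: dict[str, str] = {}

def signal_transcription_complete(session_id: str) -> None:
    """Set the completion flag; called even on error paths so jobs never hang."""
    _store[f"transcription:complete:{session_id}"] = "1"

def wait_for_transcription(session_id: str, timeout: float = 5.0,
                           poll: float = 0.05) -> bool:
    """Poll for the completion flag; return False if the timeout elapses."""
    deadline = time.monotonic() + timeout
    key = f"transcription:complete:{session_id}"
    while time.monotonic() < deadline:
        if key in _store:
            return True
        time.sleep(poll)
    return False

signal_transcription_complete("abc123")
print(wait_for_transcription("abc123", timeout=0.5))   # True
print(wait_for_transcription("missing", timeout=0.1))  # False
```

Signaling completion in the error path too (as the summary notes) is what keeps downstream conversation jobs from waiting forever on a failed session.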
@AnkushMalaker changed the title from Feat/vibevoice asr to Feat: vibevoice asr on Feb 6, 2026

coderabbitai bot commented Feb 6, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This PR introduces a Knowledge Graph feature for entity extraction backed by Neo4j, refactors ASR services into a provider-based architecture (supporting VibeVoice, Faster-Whisper, Transformers, NeMo), enhances speaker recognition with segment-level identification, and significantly expands testing infrastructure with mock services and Robot Framework tests.

Changes

Cohort / File(s) Summary
Knowledge Graph Service
backends/advanced/src/advanced_omi_backend/services/knowledge_graph/*, backends/advanced/src/advanced_omi_backend/routers/modules/knowledge_graph_routes.py
New knowledge graph service with Neo4j backend, including entity/relationship/promise models, Cypher queries, LLM-based entity extraction, and comprehensive API endpoints for entity/promise/timeline management.
Knowledge Graph Initialization & Integration
backends/advanced/init.py, backends/advanced/docker-compose.yml, backends/advanced/.env.template
Adds knowledge graph setup flow, CLI flag --enable-knowledge-graph, Neo4j configuration updates, and integration into conversation processing pipeline.
ASR Provider Architecture
extras/asr-services/providers/, extras/asr-services/common/*, extras/asr-services/init.py, extras/asr-services/docker-compose.yml, extras/asr-services/pyproject.toml
Complete refactoring from Parakeet-centric to provider-based ASR with new providers (Faster-Whisper, Transformers, NeMo, VibeVoice), standardized response models, base service class, and audio utilities.
ASR Configuration & Documentation
extras/asr-services/.env.template, extras/asr-services/README.md, config/config.yml.template
Replaces Parakeet-specific configuration with provider-agnostic setup, adds provider capabilities metadata, and updates documentation to reflect provider-based architecture.
Speaker Recognition Enhancements
backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py, backends/advanced/src/advanced_omi_backend/testing/mock_speaker_client.py
Adds segment-level identification methods identify_segment and identify_provider_segments with concurrent processing and majority-vote speaker mapping.
Transcription Provider Capabilities
backends/advanced/src/advanced_omi_backend/services/capabilities.py, backends/advanced/src/advanced_omi_backend/model_registry.py, backends/advanced/src/advanced_omi_backend/models/conversation.py, backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
Introduces capability system to track provider features (word_timestamps, segments, diarization), validates requirements, and controls pipeline execution.
Conversation Processing & Memory Context
backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py, backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py, backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py, backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py
Enhanced job orchestration with memory-context enrichment for summaries, provider-aware diarization logic, segment-based speaker identification, and knowledge graph extraction post-processing.
Conversation Controllers & Utilities
backends/advanced/src/advanced_omi_backend/controllers/conversation_controller.py, backends/advanced/src/advanced_omi_backend/controllers/queue_controller.py, backends/advanced/src/advanced_omi_backend/utils/conversation_utils.py
Added title/summary regeneration jobs in reprocessing flow, updated dependency chains for memory-driven summaries, and integrated memory context into summary generation.
Streaming Transcription & Redis Signaling
backends/advanced/src/advanced_omi_backend/services/transcription/streaming_consumer.py
Simplified offset tracking, removed cumulative timestamp adjustments, added Redis completion signaling (transcription:complete:{session_id}), and streamlined final result storage.
API Router Integration
backends/advanced/src/advanced_omi_backend/routers/api_router.py, backends/advanced/src/advanced_omi_backend/routers/modules/__init__.py
Registered knowledge graph router alongside existing API routers for public endpoint exposure.
Frontend Knowledge Graph Components
backends/advanced/webui/src/components/knowledge-graph/*, backends/advanced/webui/src/pages/Memories.tsx, backends/advanced/webui/src/services/api.ts
New React components (EntityCard, EntityList, PromisesList) with search, filtering, and CRUD operations; updated Memories page with three tabs (Memories, Entities, Promises); extended API service with knowledge graph endpoints.
Testing Infrastructure - Mock Services
tests/libs/mock_asr_server.py, tests/libs/mock_streaming_stt_server.py, tests/scripts/mock_transcription_server.py
Added mock ASR server with provider-based responses, updated mock streaming with cumulative timestamp tracking, and mock WebSocket transcription server.
Testing Infrastructure - Robot Framework
tests/asr/*, tests/resources/asr_keywords.robot, tests/integration/websocket_transcription_e2e_test.robot
Comprehensive ASR test suites covering protocol validation, error handling, GPU integration, and VibeVoice diarization; added keyword library for ASR service interaction.
Testing Configuration & Documentation
tests/CONFIG_GUIDE.md, tests/DEBUG_GUIDE.md, tests/configs/mock-vibevoice.yml, tests/MOCK_SPEAKER_IMPLEMENTATION.md, tests/Dockerfile.mock-asr, tests/Makefile
Documentation for test configuration, debugging guide, new mock config for VibeVoice testing, mock speaker implementation details, mock ASR service container, and test targets.
Testing Utilities
tests/show_results.py, tests/run_failed_tests.sh, tests/scripts/verify_mock_servers.py, tests/.gitignore, tests/tags.md
Added test result viewer with colored output, failed test runner, mock server verification script, updated tags with GPU requirement, and ignore patterns.
Build & Deployment
services.py, wizard.py, .gitignore, backends/advanced/docker-compose-test.yml
Added knowledge graph and ASR provider selection in service orchestration; updated provider menu in wizard; updated git ignores; added mock ASR service to test compose.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Chronicle as Chronicle<br/>Backend
    participant KGService as Knowledge Graph<br/>Service
    participant LLM as LLM Client
    participant Neo4j
    participant API as Knowledge Graph<br/>API

    User->>Chronicle: Process Conversation
    Chronicle->>KGService: Extract Entities & Relationships
    KGService->>LLM: Request Entity/Promise Extraction
    LLM-->>KGService: Return JSON Structured Data
    KGService->>Neo4j: Store Entities
    KGService->>Neo4j: Store Relationships
    KGService->>Neo4j: Store Promises
    KGService->>Neo4j: Link to Conversation
    KGService-->>Chronicle: Extraction Complete

    User->>API: GET /knowledge-graph/entities
    API->>KGService: Query User Entities
    KGService->>Neo4j: Cypher Query
    Neo4j-->>KGService: Entity Rows
    KGService-->>API: Entity Objects
    API-->>User: JSON Response
sequenceDiagram
    participant Worker as Speaker Job<br/>Worker
    participant Provider as Transcription<br/>Provider
    participant Diarizer as Pyannote<br/>Diarizer
    participant Speaker as Speaker<br/>Recognition
    participant DB as Transcript DB

    Note over Worker: Check Provider Capabilities

    alt Provider Has Diarization
        Worker->>Provider: Check Segments from Provider
        Provider-->>Worker: Diarized Segments
        Worker->>Speaker: identify_provider_segments()
        Speaker->>Speaker: Group by Label
        Speaker->>Speaker: Extract Top Segments per Label
        Speaker->>Speaker: Concurrent Identification
        Speaker->>Speaker: Majority Vote Mapping
        Speaker-->>Worker: Label to Speaker Map
    else Provider Lacks Diarization
        alt Has Word Timestamps
            Worker->>Diarizer: Run Pyannote Diarization
            Diarizer-->>Worker: Diarized Segments
            Worker->>Speaker: Standard Identification
        else No Words Available
            Worker-->>Worker: Return with Existing Segments
        end
    end

    Worker->>DB: Update Transcript Version
    DB-->>Worker: Confirmation
    Worker->>Worker: Return with diarization_source
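The "Majority Vote Mapping" step in the diagram above can be illustrated with a small sketch. The shape is hypothetical: the actual `identify_provider_segments` runs identifications concurrently and works on audio segments, whereas this only shows the vote itself, mapping each provider label to its most frequently identified speaker.

```python
from collections import Counter

def majority_vote(identifications: dict[str, list[str]]) -> dict[str, str]:
    """Map each provider label to the speaker identified most often.

    identifications: provider label -> per-segment identification results.
    Labels with no identification results are dropped.
    """
    return {
        label: Counter(votes).most_common(1)[0][0]
        for label, votes in identifications.items()
        if votes
    }

votes = {
    "Speaker 0": ["alice", "alice", "unknown"],
    "Speaker 1": ["bob"],
}
print(majority_vote(votes))  # {'Speaker 0': 'alice', 'Speaker 1': 'bob'}
```

A majority vote like this tolerates a few misidentified segments per label, which is why sampling the top segments per label (as in the diagram) is usually enough.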

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
- Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
- Title check: ✅ Passed. The title "Feat: vibevoice asr" accurately summarizes the main change: introducing VibeVoice as a new ASR provider.
- Docstring Coverage: ✅ Passed. Docstring coverage is 87.36%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@AnkushMalaker (Collaborator, Author) commented:

@coderabbitai review


coderabbitai bot commented Feb 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
tests/tags.md (1)

200-200: ⚠️ Potential issue | 🟡 Minor

Stale count: "14 approved tags" should be "15".

Line 200 still references 14 approved tags, but the document now defines 15 (with the new requires-gpu tag added on Line 7 and Line 272).

Proposed fix
-**DO NOT create or use any tags other than the 14 approved tags above.**
+**DO NOT create or use any tags other than the 15 approved tags above.**
backends/advanced/src/advanced_omi_backend/services/transcription/streaming_consumer.py (1)

285-289: ⚠️ Potential issue | 🟠 Major

Hardcoded b"deepgram-stream" provider name.

The provider is hardcoded as deepgram-stream even though the system now supports multiple streaming STT providers (VibeVoice, etc.). This should use the actual provider name from self.provider.

🔧 Proposed fix
             entry = {
                 b"text": result.get("text", "").encode(),
                 b"chunk_id": (chunk_id or f"final_{int(time.time() * 1000)}").encode(),
-                b"provider": b"deepgram-stream",
+                b"provider": self.provider.name.encode() if hasattr(self.provider, 'name') else b"unknown-stream",
                 b"confidence": str(result.get("confidence", 0.0)).encode(),
                 b"processing_time": b"0.0",  # Streaming has minimal processing time
                 b"timestamp": str(time.time()).encode(),
             }
tests/libs/mock_streaming_stt_server.py (1)

190-198: ⚠️ Potential issue | 🟡 Minor

Cumulative offset not updated after sending the final response, causing stale cleanup log.

At line 192, create_final_response returns new_offset, but connection_state[client_id]["cumulative_offset"] is never updated with it. The finally block at line 215 then logs a final_offset that doesn't include the final response's contribution.

This doesn't affect test correctness, but the logged offset is misleading during debugging.

Proposed fix
                         current_offset = connection_state[client_id]["cumulative_offset"]
                         final, new_offset = create_final_response(cumulative_offset=current_offset)
                         await websocket.send(json.dumps(final))
+                        connection_state[client_id]["cumulative_offset"] = new_offset
                         logger.info(f"Sent final result to {client_id} (offset: {current_offset:.2f}s → {new_offset:.2f}s): {final['channel']['alternatives'][0]['transcript']}")

Also applies to: 212-219

backends/advanced/src/advanced_omi_backend/workers/speaker_jobs.py (1)

207-355: ⚠️ Potential issue | 🟡 Minor

Fix Ruff F841/F541 in capability handling.

Unused variables and f-strings without placeholders will fail lint. Either use the capability flags or remove them, and drop unnecessary f-prefixes.

🧹 Suggested fix
-    provider_has_word_timestamps = provider_capabilities.get("word_timestamps", False)
+    # provider_has_word_timestamps = provider_capabilities.get("word_timestamps", False)

...
-            logger.warning(f"🎤 Provider claimed diarization but no segments found")
+            logger.warning("🎤 Provider claimed diarization but no segments found")

...
-    # Check if we can run pyannote diarization
-    # Pyannote requires word timestamps to align speaker segments with text
-    can_run_pyannote = bool(actual_words) and not provider_has_diarization
+    # Pyannote requires word timestamps to align speaker segments with text

...
-        logger.warning(
-            f"🎤 No word timestamps available and provider didn't diarize. "
-            f"Speaker recognition cannot improve segments."
-        )
+        logger.warning(
+            "🎤 No word timestamps available and provider didn't diarize. "
+            "Speaker recognition cannot improve segments."
+        )

...
-            logger.info(f"🎤 Using segment-level speaker identification for provider-diarized segments")
+            logger.info("🎤 Using segment-level speaker identification for provider-diarized segments")

...
-                logger.info(f"🔐 Generated backend token for speaker service")
+                logger.info("🔐 Generated backend token for speaker service")

...
-            logger.info(f"🎤 Calling speaker recognition service with conversation_id...")
+            logger.info("🎤 Calling speaker recognition service with conversation_id...")
🤖 Fix all issues with AI agents
In
`@backends/advanced/src/advanced_omi_backend/services/knowledge_graph/queries.py`:
- Around line 119-123: The DELETE queries use RETURN count(e) or count(p) after
DETACH DELETE which unbinds those variables; update the three query strings
DELETE_ENTITY, DELETE_PROMISE, and DELETE_USER_ENTITIES to return count(*)
instead (e.g., replace "RETURN count(e) as deleted_count" with "RETURN count(*)
as deleted_count" and similarly replace any "count(p)" with "count(*)") so
deletion counts reflect processed rows rather than an unbound variable.

In `@backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py`:
- Around line 250-276: The import for get_config is incorrect and will raise at
runtime; replace the import in the try block that currently references
advanced_omi_backend.model_registry with the correct module
advanced_omi_backend.config so that get_config() resolves; update the import
statement near the KG extraction logic (where get_config is called, surrounding
variables include kg_enabled, kg_service, and the call
kg_service.process_conversation that uses conversation_id, full_conversation,
user_id and conversation_model.title) to import get_config from
advanced_omi_backend.config.

In `@extras/asr-services/providers/vibevoice/transcriber.py`:
- Around line 315-321: The fallback path in _map_to_result is double-prefixing
speaker labels because _parse_vibevoice_output already returns strings like
"Speaker 0", but _map_to_result blindly wraps speaker_raw with f"Speaker
{speaker_raw}"; update _map_to_result to detect and avoid double-prefixing by:
when building speaker_id, if speaker_raw is None keep None; if speaker_raw is an
int or a numeric string format as "Speaker {n}"; if speaker_raw is a string that
already starts with "Speaker " use it as-is; otherwise convert to string without
adding an extra "Speaker " prefix. Reference _parse_vibevoice_output and
_map_to_result to locate the related logic.

In `@tests/asr/protocol_tests.robot`:
- Around line 159-174: The known capability set in the "ASR Capabilities Are
From Known Set" Robot test is out of date: update the @{known_caps} list (used
by the test that calls Get ASR Service Info and iterates over
${info}[capabilities]) to match actual provider outputs by including
language_detection, vad_filter, and translation and removing segments and
diarization so the test validates reported capabilities correctly; ensure the
test documentation string is updated to reflect the new canonical set.
🟠 Major comments (19)
extras/asr-services/providers/faster_whisper/Dockerfile-18-43 (1)

18-43: 🛠️ Refactor suggestion | 🟠 Major

Container runs as root — add a non-root user.

Trivy correctly flags that no USER directive is set. Running as root inside the container is a security risk. Add a non-root user in the runtime stage.

Proposed fix
 FROM python:3.12-slim-bookworm AS runtime
 ENV PYTHONUNBUFFERED=1
 WORKDIR /app

 # Install runtime dependencies
 RUN apt-get update && apt-get install -y --no-install-recommends \
         curl ffmpeg \
-    && rm -rf /var/lib/apt/lists/*
+    && rm -rf /var/lib/apt/lists/* \
+    && groupadd --gid 1000 appuser \
+    && useradd --uid 1000 --gid appuser --no-create-home appuser

 # Copy virtual environment and application
 COPY --from=builder /app /app
 COPY common/ ./common/
 COPY providers/faster_whisper/ ./providers/faster_whisper/

+RUN chown -R appuser:appuser /app
+USER appuser
+
 ENV PATH="/app/.venv/bin:$PATH"
backends/advanced/webui/src/components/knowledge-graph/index.ts-1-5 (1)

1-5: ⚠️ Potential issue | 🟠 Major

Re-exporting Promise type propagates the global shadowing issue.

This barrel makes the shadowing from PromisesList.tsx even more widespread. Once renamed in PromisesList.tsx, update this re-export accordingly.

backends/advanced/webui/src/components/knowledge-graph/PromisesList.tsx-5-19 (1)

5-19: ⚠️ Potential issue | 🟠 Major

Exporting Promise shadows the global Promise type.

TypeScript's global Promise<T> is fundamental. Exporting an interface named Promise will shadow it in any file that imports this type, potentially causing confusing type errors when trying to use Promise<T> for async operations in the same scope.

Consider renaming to PromiseItem, KGPromise, or TaskPromise.

extras/asr-services/pyproject.toml-97-118 (1)

97-118: ⚠️ Potential issue | 🟠 Major

Remove unused diffusers dependency from both transformers and vibevoice groups.

diffusers>=0.30.0 is listed in both dependency groups but is not imported or used anywhere in the provider implementations. A full search of the codebase yields no references to diffusers. The transformers provider uses only the transformers library pipeline, and the vibevoice provider uses custom VibeVoice modules that do not depend on diffusers. This appears to be leftover from development and should be removed.

services.py-136-142 (1)

136-142: 🛠️ Refactor suggestion | 🟠 Major

provider_to_service mapping is duplicated — extract to a module-level constant.

This dict appears identically in both the build path and the run path. If a provider is added or renamed, only one copy may be updated.

♻️ Proposed refactor

Add near the top of the file (e.g., after SERVICES):

# Map ASR provider names to their docker-compose service names
ASR_PROVIDER_TO_SERVICE = {
    'vibevoice': 'vibevoice-asr',
    'faster-whisper': 'faster-whisper-asr',
    'transformers': 'transformers-asr',
    'nemo': 'nemo-asr',
    'parakeet': 'parakeet-asr',
}

Then replace both inline dicts:

-                provider_to_service = {
-                    'vibevoice': 'vibevoice-asr',
-                    'faster-whisper': 'faster-whisper-asr',
-                    'transformers': 'transformers-asr',
-                    'nemo': 'nemo-asr',
-                    'parakeet': 'parakeet-asr',
-                }
-                asr_service_to_build = provider_to_service.get(asr_provider)
+                asr_service_to_build = ASR_PROVIDER_TO_SERVICE.get(asr_provider)

(Same replacement in the run block.)

Also applies to: 267-273

extras/asr-services/providers/nemo/transcriber.py-92-96 (1)

92-96: ⚠️ Potential issue | 🟠 Major

Blocking model inference on the async event loop.

self.model.transcribe(...) is a synchronous, CPU/GPU-bound NeMo call that can take seconds to minutes. Running it directly inside an async method — even under an asyncio.Lock — blocks the entire event loop, starving all other coroutines (health checks, concurrent requests, etc.).

Offload the blocking call to a thread executor:

🐛 Proposed fix
     async with self._lock:
-        with torch.no_grad():
-            results = self.model.transcribe(
-                [audio_file_path], batch_size=1, timestamps=True
-            )
+        loop = asyncio.get_event_loop()
+        results = await loop.run_in_executor(
+            None,
+            lambda: self._transcribe_sync(audio_file_path),
+        )

Add a private helper:

def _transcribe_sync(self, audio_file_path: str):
    with torch.no_grad():
        return self.model.transcribe(
            [audio_file_path], batch_size=1, timestamps=True
        )
extras/asr-services/providers/nemo/transcriber.py-126-130 (1)

126-130: ⚠️ Potential issue | 🟠 Major

Remove segments from stt-nemo capabilities or implement segment extraction.

The NeMo transcriber declares segments as a capability in config/config.yml.template (lines 240–242) with extract mapping (line 253), but the implementation always returns segments=[] (line 129). Unlike Faster-Whisper and VibeVoice, which actively populate segments from model output, NeMo only extracts word-level timestamps. Downstream code checking capabilities via transcript.segments will incorrectly identify segments as available when they're not.

Either remove segments from the capability list or implement proper segment extraction from the NeMo model output.

backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py-393-399 (1)

393-399: ⚠️ Potential issue | 🟠 Major

Don’t hardcode user_id during segment identification.

identify_provider_segments() ignores its user_id parameter and forces "1", which risks cross-user speaker leakage and wrong mappings. Use the caller-supplied user_id.

✅ Suggested fix
-                    result = await self.identify_segment(
-                        wav_bytes, user_id="1", similarity_threshold=similarity_threshold
-                    )
+                    result = await self.identify_segment(
+                        wav_bytes,
+                        user_id=str(user_id) if user_id is not None else None,
+                        similarity_threshold=similarity_threshold,
+                    )
extras/asr-services/providers/vibevoice/transcriber.py-156-164 (1)

156-164: ⚠️ Potential issue | 🟠 Major

trust_remote_code=True is a security concern — document the justification.

Passing trust_remote_code=True to from_pretrained allows arbitrary code execution from the model repository. If this is required by the VibeVoice model, add an inline comment explaining why and ensure the model source is trusted and pinned.

backends/advanced/src/advanced_omi_backend/routers/modules/knowledge_graph_routes.py-70-75 (1)

70-75: ⚠️ Potential issue | 🟠 Major

Internal error details leak to clients via str(e) in 500 responses.

Every endpoint's catch-all handler returns f"Error ...: {str(e)}" directly to the client. This can expose stack internals, file paths, or database details. Return a generic message to the client and keep the detailed error in the server log only.

This applies across all endpoints (lines 74, 101, 143, 170, 207, 259, 300, 327, 385).

Suggested pattern (apply to all catch-all handlers)
     except Exception as e:
-        logger.error(f"Error getting entities: {e}")
+        logger.exception(f"Error getting entities: {e}")
         return JSONResponse(
             status_code=500,
-            content={"message": f"Error getting entities: {str(e)}"},
+            content={"message": "Internal server error"},
         )
extras/asr-services/init.py-158-171 (1)

158-171: ⚠️ Potential issue | 🟠 Major

String comparison for CUDA version will break for versions ≥ 12.10.

cuda_ver >= "12.8" performs lexicographic comparison. This works for the current set (12.1, 12.6, 12.8) but will misclassify future versions like "12.10" (lexicographically "12.10" < "12.8" because '1' < '8').

🐛 Proposed fix: compare as tuples of ints
                 if match:
                     major, minor = match.groups()
-                    cuda_ver = f"{major}.{minor}"
-                    if cuda_ver >= "12.8":
+                    cuda_ver = (int(major), int(minor))
+                    if cuda_ver >= (12, 8):
                         return "cu128"
-                    elif cuda_ver >= "12.6":
+                    elif cuda_ver >= (12, 6):
                         return "cu126"
-                    elif cuda_ver >= "12.1":
+                    elif cuda_ver >= (12, 1):
                         return "cu121"
extras/asr-services/providers/vibevoice/transcriber.py-85-119 (1)

85-119: ⚠️ Potential issue | 🟠 Major

Cloning an external Git repository at runtime is a security and reliability risk.

The _setup_vibevoice method runs git clone from a hardcoded GitHub URL at runtime. This introduces:

  1. A supply-chain attack vector — the remote repo content could change between deployments.
  2. A runtime failure point — network issues or GitHub outages would break model loading.
  3. sys.path manipulation makes the imported code difficult to audit.

Consider pinning to a specific commit hash and, ideally, vendoring or packaging the dependency in the Docker image at build time rather than cloning at runtime.

🔒 Suggested improvement: pin to a commit hash
             subprocess.run(
                 [
                     "git",
                     "clone",
+                    "--branch", "v1.0",  # or use a specific tag
                     "https://github.com/microsoft/VibeVoice.git",
                     str(vibevoice_dir),
                 ],
                 check=True,
             )

Better yet, add the clone step to the Dockerfile so it's done at build time with a pinned commit:

RUN git clone --depth 1 https://github.com/microsoft/VibeVoice.git /models/vibevoice \
    && cd /models/vibevoice && git checkout <pinned-commit-sha>
extras/asr-services/init.py-313-319 (1)

313-319: ⚠️ Potential issue | 🟠 Major

Configuration mismatch: TORCH_DTYPE and attention settings don't align with the transcriber.

Two issues:

  1. TORCH_DTYPE is set to "float16" here, but the transcriber's docstring and default recommend bfloat16 for VibeVoice.
  2. USE_FLASH_ATTENTION is set, but the transcriber reads VIBEVOICE_ATTN_IMPL (expecting "flash_attention_2", "sdpa", or "eager"). This env var will be ignored.
🐛 Proposed fix
         elif provider == "vibevoice":
-            # VibeVoice uses transformers backend with specific optimizations
-            self.config["TORCH_DTYPE"] = "float16"
+            self.config["TORCH_DTYPE"] = "bfloat16"
             self.config["DEVICE"] = "cuda"
-            self.config["USE_FLASH_ATTENTION"] = "true"
-            self.console.print("[blue][INFO][/blue] Enabled Flash Attention for VibeVoice")
+            self.config["VIBEVOICE_ATTN_IMPL"] = "flash_attention_2"
+            self.console.print("[blue][INFO][/blue] Enabled Flash Attention 2 for VibeVoice")
backends/advanced/src/advanced_omi_backend/services/capabilities.py-88-92 (1)

88-92: ⚠️ Potential issue | 🟠 Major

diarization_source is a direct field on TranscriptVersion, not inside metadata.

The TranscriptVersion model defines diarization_source as a top-level field, not a key within the metadata dict. Line 90 checks the wrong location, so it will never find the diarization source and cause check_requirements to incorrectly report missing diarization data. This is evident from other usages throughout the codebase (e.g., speaker_jobs.py, conversation_controller.py) which all access it as a direct field.

Proposed fix
-    # Check for diarization source in metadata
-    diarization_source = transcript.metadata.get("diarization_source")
+    # diarization_source is a top-level field on TranscriptVersion
+    diarization_source = transcript.diarization_source
     if diarization_source:
         available.add(FeatureRequirement.DIARIZATION)
extras/asr-services/providers/vibevoice/service.py-71-77 (1)

71-77: ⚠️ Potential issue | 🟠 Major

Inconsistent capability declarations between service.py and init.py PROVIDERS — align before merging.

The get_capabilities() method returns ["timestamps", "diarization", "speaker_identification", "long_form"] but the PROVIDERS registry in init.py (line 42) declares ["segments", "diarization", "timestamps"]. This mismatch can cause capability-check failures in the pipeline. The transcriber (line 366) confirms words=[] with "doesn't provide word-level timestamps", which aligns with init.py's comment but contradicts service.py's declared capabilities. Resolve this by ensuring both sources list the same capabilities.
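One way to prevent this drift is a single source of truth that both the registry and the service read. A minimal sketch, with assumed names (the real `PROVIDERS` registry and service class live in separate modules):

```python
# Sketch with assumed names: declare the capability list once and import it
# everywhere, so service.py and the PROVIDERS registry cannot disagree.
VIBEVOICE_CAPABILITIES: list[str] = ["segments", "diarization", "timestamps"]

class VibeVoiceService:
    def get_capabilities(self) -> list[str]:
        # Return a copy so callers cannot mutate the shared constant
        return list(VIBEVOICE_CAPABILITIES)

# The registry entry references the same constant instead of a second literal
PROVIDERS = {"vibevoice": {"capabilities": VIBEVOICE_CAPABILITIES}}
```

With this shape, whichever list is correct (the transcriber's `words=[]` suggests word-level timestamps should not be advertised) only needs to be fixed in one place.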

backends/advanced/src/advanced_omi_backend/services/knowledge_graph/service.py-186-208 (1)

186-208: ⚠️ Potential issue | 🟠 Major

All async def methods call synchronous Neo4j driver operations, blocking the event loop.

self._write.run(...) and self._read.run(...) are synchronous (per the Neo4jClient / Neo4jWriteInterface snippets). Wrapping them in async def does not make them non-blocking — the event loop will stall on every Neo4j call. Since this service is wired into FastAPI routes, this will degrade request throughput.

Either:

  1. Run sync calls in a thread executor: await asyncio.to_thread(self._write.run, query, **params), or
  2. Switch to the async Neo4j driver (neo4j.AsyncGraphDatabase), or
  3. Drop async from these methods and let FastAPI handle the thread offloading via def endpoints.

This affects every method in the class (_create_conversation_entity, _store_entities, _store_relationships, _store_promises, _link_entities_to_conversation, get_entities, get_entity, etc.).

Also applies to: 210-261
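Option 1 can be sketched as below; `FakeWriteInterface` is a stand-in for the synchronous Neo4j write interface, not the real class:

```python
import asyncio

class FakeWriteInterface:
    """Stand-in for the synchronous Neo4jWriteInterface (assumed shape)."""

    def run(self, query: str, **params) -> dict:
        # The real driver would block on network I/O here
        return {"query": query, "params": params}

async def store_entities(write: FakeWriteInterface, query: str, **params) -> dict:
    # Offload the blocking call to a worker thread so the event loop
    # keeps serving other requests while the query runs
    return await asyncio.to_thread(write.run, query, **params)
```

`asyncio.to_thread` runs the sync call in the default thread pool; option 2 (the async Neo4j driver) avoids the thread hop entirely and is preferable for high query volume.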

backends/advanced/src/advanced_omi_backend/services/knowledge_graph/service.py-701-718 (1)

701-718: ⚠️ Potential issue | 🟠 Major

_row_to_entity will crash on unknown entity types, unlike the sibling converters.

_row_to_relationship (Line 723) and _row_to_promise (Line 747) wrap the enum conversion in try/except ValueError with a fallback, but _row_to_entity at Line 706 does not. If Neo4j contains an entity with a type not in the EntityType enum, this raises an unhandled ValueError and breaks the entire query result.

-        return Entity(
-            ...
-            type=EntityType(data.get("type", "thing")),
+        entity_type_str = data.get("type", "thing")
+        try:
+            entity_type = EntityType(entity_type_str)
+        except ValueError:
+            entity_type = EntityType.THING
+
+        return Entity(
+            ...
+            type=entity_type,
backends/advanced/src/advanced_omi_backend/services/knowledge_graph/service.py-602-630 (1)

602-630: ⚠️ Potential issue | 🟠 Major

update_promise_status does not set completed_at when transitioning to COMPLETED.

When a promise's status changes to "completed", the completed_at field should be populated with the current timestamp. Currently, the service passes only the new status to the query, leaving completed_at as None.

Also, the incoming status string is not validated against PromiseStatus before writing. An invalid value will be stored silently.

🛡️ Proposed fix
     async def update_promise_status(
         self,
         promise_id: str,
         user_id: str,
         status: str,
     ) -> Optional[Promise]:
         self._ensure_initialized()
 
+        # Validate status
+        try:
+            PromiseStatus(status)
+        except ValueError:
+            logger.warning(f"Invalid promise status: {status}")
+            return None
+
+        completed_at = datetime.utcnow().isoformat() if status == PromiseStatus.COMPLETED.value else None
+
         results = self._write.run(
             queries.UPDATE_PROMISE_STATUS,
             id=promise_id,
             user_id=user_id,
             status=status,
+            completed_at=completed_at,
         )
backends/advanced/src/advanced_omi_backend/services/knowledge_graph/service.py-78-88 (1)

78-88: ⚠️ Potential issue | 🟠 Major

_ensure_initialized is not thread-/coroutine-safe — concurrent callers can create duplicate clients.

Multiple async callers (or threads via the global singleton) can enter _ensure_initialized concurrently, both see _initialized == False, and each create a separate Neo4jClient. Only the last one survives on self._client, leaking the earlier driver connection. Consider guarding with a lock (or asyncio.Lock if staying async).

🔒 Proposed fix
+import asyncio
+
 class KnowledgeGraphService:
+    _init_lock = asyncio.Lock()
     ...
-    def _ensure_initialized(self) -> None:
-        """Ensure Neo4j client is initialized."""
-        if not self._initialized:
+    async def _ensure_initialized(self) -> None:
+        """Ensure Neo4j client is initialized."""
+        if self._initialized:
+            return
+        async with self._init_lock:
+            if self._initialized:
+                return
             self._client = Neo4jClient(
                 uri=self.neo4j_uri,
                 user=self.neo4j_user,
                 password=self.neo4j_password,
             )
             self._read = Neo4jReadInterface(self._client)
             self._write = Neo4jWriteInterface(self._client)
             self._initialized = True
             logger.info("Knowledge Graph Service initialized with Neo4j connection")
🟡 Minor comments (45)
tests/DEBUG_GUIDE.md-22-22 (1)

22-22: ⚠️ Potential issue | 🟡 Minor

Fix typo in example output.

“acces...” should be “access” to avoid user-facing doc errors.

tests/DEBUG_GUIDE.md-10-10 (1)

10-10: ⚠️ Potential issue | 🟡 Minor

Use consistent output file naming to avoid broken commands.

Line 10 references results/output.xml.txt, but later sections use results/output.xml and <output-file>. Align these to one canonical filename so copy/paste works.

extras/asr-services/providers/__init__.py-1-8 (1)

1-8: ⚠️ Potential issue | 🟡 Minor

Docstring likely outdated—include VibeVoice or clarify scope.

The module-level “Available providers” list omits VibeVoice, which the PR objectives say is now supported. Please update the list (or add a clarifying note if this module intentionally only documents a subset).

tests/integration/websocket_transcription_e2e_test.robot-244-312 (1)

244-312: ⚠️ Potential issue | 🟡 Minor

Test name/docs say "Word Timestamps" but implementation checks segment-level timestamps.

The test is named "Word Timestamps Are Monotonically Increasing" and the documentation references "word timestamps," but the implementation at lines 285-304 iterates over ${conversation}[segments] and checks segment-level start/end times. If the intent is to validate word-level timestamps (e.g., per-word timing within segments), this test doesn't cover that. If segment-level is the actual intent, consider renaming to "Segment Timestamps Are Monotonically Increasing" for clarity.

tests/Makefile-14-14 (1)

14-14: ⚠️ Potential issue | 🟡 Minor

Adding asr to TEST_DIR exposes GPU-tagged tests in targets that lack --exclude requires-gpu.

The all target correctly excludes requires-gpu, but several other targets that consume $(TEST_DIR) do not:

  • test-no-api (line 269) — CI mode, which typically runs on GPU-less infrastructure
  • test-with-api-keys (line 252)
  • test-all-with-slow-and-sdk (line 285)
  • test-slow (line 231)

Two ASR test files are tagged with requires-gpu: gpu_integration_tests.robot and vibevoice_diarization_test.robot. These will be picked up and fail on machines without a GPU when running the above targets. Add --exclude requires-gpu to at least test-no-api (CI mode).
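A minimal sketch of the change; the recipe body and variable names are assumed, but `--exclude` is a standard Robot Framework option:

```make
# CI-mode target: keep GPU-tagged suites out of runs on GPU-less infrastructure
# (recipe body assumed; adapt to the existing test-no-api invocation)
test-no-api:
	uv run robot --exclude requires-gpu --outputdir results $(TEST_DIR)
```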

extras/asr-services/providers/vibevoice/impl.md-1-17 (1)

1-17: ⚠️ Potential issue | 🟡 Minor

Raw transcription notes committed as documentation.

This file reads like unedited voice-to-text output (e.g., "make sureWe", "diarisation.So"). If this is intended as persistent design documentation, consider cleaning up formatting and grammar. If it's temporary scratch notes, consider keeping it out of version control or placing it in an untracked/ directory.

tests/run_failed_tests.sh-33-33 (1)

33-33: ⚠️ Potential issue | 🟡 Minor

Hardcoded total count will drift if tests are added or removed.

Use ${#tests[@]} instead of the literal 6 to keep the progress indicator in sync with the array.

Proposed fix
-    echo "Test $test_count/6: $test_name"
+    echo "Test $test_count/${#tests[@]}: $test_name"
tests/CONFIG_GUIDE.md-167-169 (1)

167-169: ⚠️ Potential issue | 🟡 Minor

Add a language specifier to the fenced code block.

The code block at line 167 is missing a language identifier, which triggers markdownlint MD040. Use ```text for plain output blocks.

🔧 Proposed fix
 Expected output:
-```
+```text
 CONFIG_FILE=/app/test-configs/mock-services.yml
extras/asr-services/common/response_models.py-63-76 (1)

63-76: ⚠️ Potential issue | 🟡 Minor

to_dict() docstring says "excluding None values" but nested model_dump() includes them.

model_dump() on Word and Segment will include confidence: null and speaker: null respectively. The top-level fields (speakers, language, duration) are correctly excluded when None, but the nested objects are not. Either update the docstring or pass exclude_none=True to model_dump():

Proposed fix — exclude None in nested models
     def to_dict(self) -> dict:
         """Convert to dictionary, excluding None values."""
         result = {
             "text": self.text,
-            "words": [w.model_dump() for w in self.words],
-            "segments": [s.model_dump() for s in self.segments],
+            "words": [w.model_dump(exclude_none=True) for w in self.words],
+            "segments": [s.model_dump(exclude_none=True) for s in self.segments],
         }
         if self.speakers is not None:
-            result["speakers"] = [s.model_dump() for s in self.speakers]
+            result["speakers"] = [s.model_dump(exclude_none=True) for s in self.speakers]

Alternatively, if consumers rely on the null keys being present, update the docstring instead.

extras/asr-services/providers/transformers/transcriber.py-92-93 (1)

92-93: ⚠️ Potential issue | 🟡 Minor

Device check is too narrow — misses cuda:N variants.

self.device could be set to "cuda:0", "cuda:1", etc. via the DEVICE env var. For those values the equality check == "cuda" fails, so .to() is never called and the model stays on CPU.

Proposed fix
-        if self.device == "cuda":
-            self.model = self.model.to(self.device)
+        if self.device != "cpu":
+            self.model = self.model.to(self.device)
backends/advanced/webui/src/components/knowledge-graph/EntityCard.tsx-46-53 (1)

46-53: ⚠️ Potential issue | 🟡 Minor

formatDate try/catch doesn't catch invalid dates.

new Date("garbage") doesn't throw — it returns an Invalid Date object, and toLocaleDateString() will return the string "Invalid Date" rather than throwing. The catch block is unreachable for malformed date strings.

Proposed fix
   const formatDate = (dateStr?: string) => {
     if (!dateStr) return null
-    try {
-      return new Date(dateStr).toLocaleDateString()
-    } catch {
-      return null
-    }
+    const date = new Date(dateStr)
+    if (isNaN(date.getTime())) return null
+    return date.toLocaleDateString()
   }
tests/MOCK_SPEAKER_IMPLEMENTATION.md-204-204 (1)

204-204: ⚠️ Potential issue | 🟡 Minor

Hard-coded local filesystem path in documentation.

/home/ankush/workspaces/friend-lite/tests/TODO_MOCK_SPEAKER_RECOGNITION.md is a developer-specific absolute path that won't resolve for other contributors. Use a repository-relative path instead.

Proposed fix
-- **Plan**: See `/home/ankush/workspaces/friend-lite/tests/TODO_MOCK_SPEAKER_RECOGNITION.md` for detailed implementation plan
+- **Plan**: See `tests/TODO_MOCK_SPEAKER_RECOGNITION.md` for detailed implementation plan
extras/asr-services/providers/transformers/transcriber.py-164-173 (1)

164-173: ⚠️ Potential issue | 🟡 Minor

Single-segment output loses granularity when return_timestamps=False.

When return_timestamps=False, all_words is empty so the segment is created with start=0.0, end=0.0, which is misleading. This is a minor edge case since return_timestamps defaults to True, but worth a guard:

Proposed fix
         # Create single segment if we have text
         if text:
+            if not all_words:
+                logger.warning("No word timestamps available; segment end defaults to 0.0")
             end_time = all_words[-1].end if all_words else 0.0
             all_segments.append(
                 Segment(
                     text=text,
                     start=0.0,
                     end=end_time,
                 )
             )

Or, if you have the audio duration available, use that as the segment end time when word timestamps aren't present.

extras/asr-services/providers/faster_whisper/transcriber.py-39-40 (1)

39-40: ⚠️ Potential issue | 🟡 Minor

Device defaults to "cuda" with no GPU availability check.

Unlike the Transformers transcriber which checks torch.cuda.is_available(), this defaults to "cuda" unconditionally. If deployed on a CPU-only machine without setting DEVICE=cpu, WhisperModel will fail at load time with a CUDA error.

Proposed fix — add a fallback
+        import torch
+
         self.compute_type = os.getenv("COMPUTE_TYPE", "float16")
-        self.device = os.getenv("DEVICE", "cuda")
+        self.device = os.getenv("DEVICE", "cuda" if torch.cuda.is_available() else "cpu")

Note: this adds a torch dependency. If you want to keep torch out of this provider, an alternative is to attempt a CUDA check via ctranslate2 or simply document that DEVICE must be set explicitly for CPU deployments.

tests/libs/mock_asr_server.py-40-40 (1)

40-40: ⚠️ Potential issue | 🟡 Minor

Inconsistent default provider: env var defaults to "parakeet" but CLI defaults to "mock".

Line 40 sets PROVIDER_MODE to "parakeet" when MOCK_ASR_PROVIDER is unset, while the CLI --provider flag on Line 221 defaults to "mock". Running the server via the module entry point vs. importing the app directly will yield different behaviors. Consider aligning both defaults.

Proposed fix — align env var default to "mock"
-PROVIDER_MODE = os.environ.get("MOCK_ASR_PROVIDER", "parakeet")
+PROVIDER_MODE = os.environ.get("MOCK_ASR_PROVIDER", "mock")

Also applies to: 218-222

extras/asr-services/common/base_service.py-156-157 (1)

156-157: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefix — no placeholders in the string.

Line 157 has an f-string with no interpolation.

Proposed fix
-        logger.info(f"Transcription request started")
+        logger.info("Transcription request started")
backends/advanced/webui/src/components/knowledge-graph/EntityList.tsx-198-206 (1)

198-206: ⚠️ Potential issue | 🟡 Minor

Naive pluralization: {type}s produces "persons" instead of "People".

The entityTypes array already has proper labels (e.g., 'People', 'Organizations'). The grouped section headings should use those labels instead of appending "s" to the raw type string.

Proposed fix
                <h3 className="flex items-center space-x-2 text-lg font-semibold text-gray-900 dark:text-gray-100 mb-3 capitalize">
                  {entityTypes.find((t) => t.value === type)?.icon}
-                  <span>{type}s</span>
+                  <span>{entityTypes.find((t) => t.value === type)?.label ?? `${type}s`}</span>
                   <span className="text-sm font-normal text-gray-500 dark:text-gray-400">
extras/asr-services/common/base_service.py-170-175 (1)

170-175: ⚠️ Potential issue | 🟡 Minor

Extension extraction is incorrect when the filename has no dot.

If file.filename is "audiofile" (no extension), rsplit(".", 1) returns ["audiofile"], so [-1] yields "audiofile", which won't match the allow-list and silently falls through — that's fine. But if a filename somehow equals an allowed extension (e.g., "wav"), it would incorrectly match. Safer to check that a split actually occurred.

Proposed fix
             if file.filename:
-                ext = file.filename.rsplit(".", 1)[-1].lower()
-                if ext in ("wav", "mp3", "flac", "ogg", "m4a"):
-                    suffix = f".{ext}"
+                parts = file.filename.rsplit(".", 1)
+                if len(parts) == 2:
+                    ext = parts[1].lower()
+                    if ext in ("wav", "mp3", "flac", "ogg", "m4a"):
+                        suffix = f".{ext}"
tests/show_results.py-189-191 (1)

189-191: ⚠️ Potential issue | 🟡 Minor

relative_to(Path.cwd()) raises ValueError if the file is outside the working directory.

If the script is invoked from a directory that is not an ancestor of the results file, this will crash.

Proposed fix
-    print(f"Results from: {colorize(str(xml_file.relative_to(Path.cwd())), Colors.BLUE)}")
+    try:
+        display_path = str(xml_file.relative_to(Path.cwd()))
+    except ValueError:
+        display_path = str(xml_file)
+    print(f"Results from: {colorize(display_path, Colors.BLUE)}")
backends/advanced/webui/src/components/knowledge-graph/PromisesList.tsx-96-106 (1)

96-106: ⚠️ Potential issue | 🟡 Minor

Stale closure: promises may be outdated when setPromises runs.

setPromises(promises.filter(...)) captures promises at render time. Rapid consecutive deletes can cause one delete to "undo" another.

Proposed fix
-      setPromises(promises.filter((p) => p.id !== promiseId))
+      setPromises((prev) => prev.filter((p) => p.id !== promiseId))
extras/asr-services/common/audio_utils.py-50-70 (1)

50-70: ⚠️ Potential issue | 🟡 Minor

Silent overflow when input values exceed [-1.0, 1.0] range.

(audio_array * np.iinfo(np.int16).max).astype(np.int16) will wrap around on values outside the normalized range (e.g., from gain or accumulation). Adding np.clip before the cast prevents distortion artifacts.

Proposed fix
     if sample_width == 2:
-        audio_int = (audio_array * np.iinfo(np.int16).max).astype(np.int16)
+        clipped = np.clip(audio_array, -1.0, 1.0)
+        audio_int = (clipped * np.iinfo(np.int16).max).astype(np.int16)
         return audio_int.tobytes()
     elif sample_width == 4:
-        audio_int = (audio_array * np.iinfo(np.int32).max).astype(np.int32)
+        clipped = np.clip(audio_array, -1.0, 1.0)
+        audio_int = (clipped * np.iinfo(np.int32).max).astype(np.int32)
         return audio_int.tobytes()
backends/advanced/webui/src/pages/Memories.tsx-27-27 (1)

27-27: ⚠️ Potential issue | 🟡 Minor

Unsafe cast: invalid tab query parameter renders a blank content area.

searchParams.get('tab') as Tab doesn't validate the value. A URL like ?tab=bogus sets activeTab to 'bogus', and since none of the conditional renders (lines 318, 327, 339) match, the user sees an empty page below the tabs.

Proposed fix
-  const [activeTab, setActiveTab] = useState<Tab>((searchParams.get('tab') as Tab) || 'memories')
+  const validTabs: Tab[] = ['memories', 'entities', 'promises']
+  const rawTab = searchParams.get('tab')
+  const [activeTab, setActiveTab] = useState<Tab>(
+    rawTab && validTabs.includes(rawTab as Tab) ? (rawTab as Tab) : 'memories'
+  )
tests/show_results.py-152-162 (1)

152-162: ⚠️ Potential issue | 🟡 Minor

find_parent_suite may return an ancestor instead of the direct parent, producing incorrect suite paths.

suite.findall(".//test") at line 159 recursively searches all descendant tests, so a grandparent suite will also match. Since the outer for suite in root.findall(".//suite") iterates in document order, the first match could be a higher-level ancestor rather than the immediate parent. This causes intermediate suite levels to be skipped in the breadcrumb path built at lines 114-122.

A simpler approach is to build a parent map once before iterating failed tests.

Proposed fix using a parent map
+def _build_parent_map(root):
+    """Build a map of element -> parent for suite/test hierarchy."""
+    parent_map = {}
+    for suite in root.iter('suite'):
+        for child in suite:
+            parent_map[child] = suite
+    return parent_map
+
+
 def parse_results(xml_file):
     ...
     tree = ET.parse(xml_file)
     root = tree.getroot()
+    parent_map = _build_parent_map(root)
     ...
     # Then replace find_parent_suite(root, parent) with:
-                parent = find_parent_suite(root, parent)
+                parent = parent_map.get(parent)
extras/asr-services/common/audio_utils.py-22-47 (1)

22-47: ⚠️ Potential issue | 🟡 Minor

sample_rate and channels parameters are accepted but silently ignored.

A caller passing channels=2 would expect mono conversion or at least deinterleaving, but the function returns raw interleaved samples. This is a correctness trap for standalone callers (outside load_audio_file which handles this separately).

Either remove the unused parameters or implement the channel/rate handling here. If keeping for API consistency, document explicitly that the caller is responsible for mono conversion and resampling.

Option A: Remove unused params
 def convert_audio_to_numpy(
     audio_bytes: bytes,
-    sample_rate: int,
     sample_width: int = 2,
-    channels: int = 1,
 ) -> np.ndarray:

Then update load_audio_file to not pass them.

Option B: Use them
 def convert_audio_to_numpy(
     audio_bytes: bytes,
     sample_rate: int,
     sample_width: int = 2,
     channels: int = 1,
 ) -> np.ndarray:
     ...
     if sample_width == 2:
         audio_array = np.frombuffer(audio_bytes, dtype=np.int16)
-        return audio_array.astype(np.float32) / np.iinfo(np.int16).max
+        audio_array = audio_array.astype(np.float32) / np.iinfo(np.int16).max
     elif sample_width == 4:
         audio_array = np.frombuffer(audio_bytes, dtype=np.int32)
-        return audio_array.astype(np.float32) / np.iinfo(np.int32).max
+        audio_array = audio_array.astype(np.float32) / np.iinfo(np.int32).max
     else:
         raise ValueError(f"Unsupported sample width: {sample_width}")
+
+    if channels > 1:
+        audio_array = convert_to_mono(audio_array, channels)
+    return audio_array
extras/asr-services/providers/faster_whisper/service.py-7-12 (1)

7-12: ⚠️ Potential issue | 🟡 Minor

Unused tempfile import.

tempfile is imported but never used in this file — temp file handling is delegated to save_to_temp_wav.

🧹 Proposed fix
 import argparse
 import asyncio
 import logging
 import os
-import tempfile
 from typing import Optional
extras/asr-services/providers/transformers/service.py-7-12 (1)

7-12: ⚠️ Potential issue | 🟡 Minor

Unused tempfile import (same as in faster_whisper/service.py).

🧹 Proposed fix
 import argparse
 import asyncio
 import logging
 import os
-import tempfile
 from typing import Optional
tests/resources/asr_keywords.robot-87-103 (1)

87-103: ⚠️ Potential issue | 🟡 Minor

Dead code: @{env_vars} list is built but never used.

Lines 88–92 construct an @{env_vars} list that is never passed to Run Process. The actual environment variables are passed directly via env: keyword arguments on Lines 102–103. Remove the unused list construction to avoid confusion.

🧹 Proposed fix
     ${asr_dir}=    Set Variable    ${CURDIR}/../../extras/asr-services
 
-    # Build environment variables
-    @{env_vars}=    Create List
-    IF    "${model}" != "${EMPTY}"
-        Append To List    ${env_vars}    ASR_MODEL=${model}
-    END
-    Append To List    ${env_vars}    ASR_PORT=${port}
-
     Log To Console    \n🚀 Starting ${service} on port ${port}...
wizard.py-501-503 (1)

501-503: ⚠️ Potential issue | 🟡 Minor

capitalize() produces "Vibevoice" instead of "VibeVoice".

"vibevoice".capitalize() returns "Vibevoice", which doesn't match the product name "VibeVoice". Consider using a lookup for display names.

♻️ Suggested fix
+    _PROVIDER_DISPLAY = {"parakeet": "Parakeet", "vibevoice": "VibeVoice"}
     # Auto-add asr-services if local ASR was chosen (Parakeet or VibeVoice)
     if transcription_provider in ("parakeet", "vibevoice") and 'asr-services' not in selected_services:
-        console.print(f"[blue][INFO][/blue] Auto-adding ASR services for {transcription_provider.capitalize()} transcription")
+        console.print(f"[blue][INFO][/blue] Auto-adding ASR services for {_PROVIDER_DISPLAY.get(transcription_provider, transcription_provider)} transcription")
         selected_services.append('asr-services')
extras/asr-services/providers/nemo/Dockerfile-23-29 (1)

23-29: ⚠️ Potential issue | 🟡 Minor

Remove build-essential from the runtime stage—it's unnecessary and increases the attack surface.

The builder stage compiles all dependencies and packages them into a venv. The runtime stage copies this pre-built venv without any need for compilation tools. NeMo's model loading (via ASRModel.from_pretrained()) and transcription (via model.transcribe()) only use pre-compiled packages and pre-trained weights—no runtime compilation occurs. Keep only libsndfile1, portaudio19-dev, and curl.

Suggested fix
 RUN apt-get update && apt-get install -y --no-install-recommends \
-        libsndfile1 build-essential portaudio19-dev curl \
+        libsndfile1 portaudio19-dev curl \
     && rm -rf /var/lib/apt/lists/*
services.py-127-152 (1)

127-152: ⚠️ Potential issue | 🟡 Minor

Inconsistent fallback between build and up when ASR_PROVIDER is unrecognized.

During build (line 143), if the provider isn't in the map, asr_service_to_build is None and build runs without specifying a service — building all services in the compose file. During up (line 284), the fallback explicitly starts only vibevoice-asr. This means an unrecognized or empty provider builds everything but starts only vibevoice.

Consider adding a consistent fallback in the build path too, e.g.:

                 asr_service_to_build = provider_to_service.get(asr_provider)
+                if not asr_service_to_build:
+                    asr_service_to_build = 'vibevoice-asr'  # Same default as 'up'

Also applies to: 279-284

config/config.yml.template-102-103 (1)

102-103: ⚠️ Potential issue | 🟡 Minor

Inconsistent default host across STT providers.

stt-parakeet-batch (line 102) defaults to 172.17.0.1:8767 while all new providers (lines 163, 187, 211, 235) default to host.docker.internal:8767. The former is Linux-specific (Docker bridge gateway), the latter works on Docker Desktop (macOS/Windows) but not on all Linux setups. Consider aligning on one default or documenting the difference.

Also applies to: 163-163, 187-187, 211-211, 235-235
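If host.docker.internal is kept as the canonical default, Linux hosts can be supported via Compose's host-gateway mapping; a sketch, with the service name as a placeholder:

```yaml
services:
  friend-backend:  # placeholder service name
    extra_hosts:
      # Maps host.docker.internal to the Docker bridge gateway on Linux
      - "host.docker.internal:host-gateway"
```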

backends/advanced/src/advanced_omi_backend/workers/conversation_jobs.py-835-835 (1)

835-835: ⚠️ Potential issue | 🟡 Minor

Remove f-prefix to satisfy Ruff F541.

This f-string has no placeholders and will fail lint.

🧹 Suggested fix
-                logger.info(f"📚 No memories found for context enrichment")
+                logger.info("📚 No memories found for context enrichment")
backends/advanced/src/advanced_omi_backend/workers/transcription_jobs.py-452-454 (1)

452-454: ⚠️ Potential issue | 🟡 Minor

Remove f-prefix to satisfy Ruff F541.

This f-string has no placeholders and will fail lint.

🧹 Suggested fix
-        logger.info(
-            f"📊 Created fallback segment (speaker recognition disabled, no provider segments)"
-        )
+        logger.info(
+            "📊 Created fallback segment (speaker recognition disabled, no provider segments)"
+        )
tests/asr/gpu_integration_tests.robot-132-149 (1)

132-149: ⚠️ Potential issue | 🟡 Minor

Fail fast when word timestamps are missing.

The duration check silently passes when word_count == 0, which can mask regressions in word timestamps. Add a hard assertion (or explicit skip) before using the last word.

🛠️ Suggested fix
     ${response}=    Upload Audio For ASR Transcription    ${TEST_AUDIO_FILE}    ${GPU_ASR_URL}
     ${json}=    Set Variable    ${response.json()}
 
     # Get last word timestamp (approximate audio duration)
     ${word_count}=    Get Length    ${json}[words]
-    IF    ${word_count} > 0
-        ${last_word}=    Get From List    ${json}[words]    -1
-        ${transcription_duration}=    Set Variable    ${last_word}[end]
+    Should Be True    ${word_count} > 0    Word timestamps required for duration check
+    ${last_word}=    Get From List    ${json}[words]    -1
+    ${transcription_duration}=    Set Variable    ${last_word}[end]
 
-        # For a 1-minute audio file, transcription should show reasonable duration
-        # Allow some flexibility (at least 30 seconds, no more than 90 seconds)
-        Should Be True    ${transcription_duration} > 30
-        ...    Transcription duration (${transcription_duration}s) should be > 30s for 1-min audio
-        Should Be True    ${transcription_duration} < 90
-        ...    Transcription duration (${transcription_duration}s) should be < 90s for 1-min audio
+    # For a 1-minute audio file, transcription should show reasonable duration
+    # Allow some flexibility (at least 30 seconds, no more than 90 seconds)
+    Should Be True    ${transcription_duration} > 30
+    ...    Transcription duration (${transcription_duration}s) should be > 30s for 1-min audio
+    Should Be True    ${transcription_duration} < 90
+    ...    Transcription duration (${transcription_duration}s) should be < 90s for 1-min audio
 
-        Log    Transcription duration: ${transcription_duration} seconds
-    END
+    Log    Transcription duration: ${transcription_duration} seconds
backends/advanced/init.py-281-283 (1)

281-283: ⚠️ Potential issue | 🟡 Minor

Selecting "None" leaves the default STT provider active, causing potential runtime failures.

When choice "4" is selected, defaults.stt is not updated and remains "stt-deepgram". The system will attempt transcription at runtime using the default provider, which will fail without API keys. Unlike other setup options, this branch does not call update_config_defaults(). Consider either: (1) setting defaults.stt to a no-op/disabled value, (2) explicitly clearing/nullifying the default, or (3) clarifying in the message that transcription remains configured with the existing default.
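Option 2 can be sketched as a small resolver; the function name and choice encoding are hypothetical, and the real setup flow would persist the result via update_config_defaults:

```python
from typing import Optional

def resolve_stt_default(choice: str, current_default: Optional[str]) -> Optional[str]:
    """Map the setup-menu choice to the STT default to persist (sketch).

    Choice "4" ("None") explicitly clears the default instead of silently
    leaving "stt-deepgram" active without API keys.
    """
    if choice == "4":
        return None  # explicit disable, distinguishable from "not asked"
    return current_default
```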

extras/asr-services/init.py-438-445 (1)

438-445: ⚠️ Potential issue | 🟡 Minor

CUDA setup is skipped for faster-whisper and vibevoice, but both set DEVICE=cuda.

setup_cuda_version is only called for nemo and transformers. However, setup_provider_config sets DEVICE=cuda for both faster-whisper (line 304) and vibevoice (line 316). If the user doesn't have CUDA or needs a specific PyTorch CUDA build, this could cause runtime failures for those providers.

extras/asr-services/providers/vibevoice/transcriber.py-261-263 (1)

261-263: ⚠️ Potential issue | 🟡 Minor

Logging raw model output (up to 1000 chars) leaks transcription content into logs.

Lines 262–263 log the first and last 500 characters of the raw transcription output at INFO level. This could contain sensitive/private speech content. Downgrade to DEBUG or remove.

Suggested fix
-        logger.info(f"Raw output preview (first 500 chars): {raw_output[:500]}")
-        logger.info(f"Raw output preview (last 500 chars): {raw_output[-500:]}")
+        logger.debug(f"Raw output preview (first 500 chars): {raw_output[:500]}")
+        logger.debug(f"Raw output preview (last 500 chars): {raw_output[-500:]}")
extras/asr-services/providers/vibevoice/transcriber.py-265-274 (1)

265-274: ⚠️ Potential issue | 🟡 Minor

Greedy regex fallback can match unintended content.

Line 268's r'\[\s*\{.*\}\s*\]' with re.DOTALL is correct for the primary pattern, but the fallback at line 274 (r'\[.*\]' with re.DOTALL) will greedily match from the first [ to the last ] in the entire output, potentially capturing non-JSON content. This could cause json.loads to fail or parse unexpected data.

Suggested fix: use non-greedy quantifier
-            json_match = re.search(r'\[.*\]', raw_output, re.DOTALL)
+            json_match = re.search(r'\[.*?\]', raw_output, re.DOTALL)
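The difference between the two quantifiers is easy to demonstrate in isolation. The sketch below uses a hypothetical `raw_output` string (not taken from the actual model) shaped like VibeVoice output: a JSON segment array followed by trailing prose that also contains brackets.

```python
import json
import re

# Hypothetical raw model output: a JSON segment array followed by
# trailing prose that also contains square brackets.
raw_output = 'Result: [{"Speaker": 0, "Text": "hi"}]\nNote: timestamps are [approximate]'

# Greedy fallback r'\[.*\]': spans from the FIRST '[' to the LAST ']',
# swallowing the trailing prose, so json.loads fails on the match.
greedy = re.search(r'\[.*\]', raw_output, re.DOTALL).group()

# Non-greedy r'\[.*?\]': stops at the first closing ']', isolating the array.
lazy = re.search(r'\[.*?\]', raw_output, re.DOTALL).group()
segments = json.loads(lazy)  # parses cleanly
```

Note the non-greedy form still assumes the JSON array appears before any other bracketed text; anchoring on `[{` ... `}]`, as the primary pattern does, remains the more robust option.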
extras/asr-services/init.py-420-431 (1)

420-431: ⚠️ Potential issue | 🟡 Minor

Stale references to parakeet-era naming.

Lines 421 and 430 still reference parakeet.env and PARAKEET_ASR_URL, which are artifacts from the pre-provider architecture. These are confusing in a provider-agnostic setup flow.

Suggested fix
         self.console.print("2. Or use a pre-configured profile:")
-        self.console.print("   [cyan]cp configs/parakeet.env .env && docker compose up --build -d nemo-asr[/cyan]")
+        self.console.print(f"   [cyan]cp configs/{provider}.env .env && docker compose up --build -d {service_name}[/cyan]")
         self.console.print()
         ...
         self.console.print("5. Configure Chronicle backend:")
-        self.console.print(f"   Set PARAKEET_ASR_URL=http://host.docker.internal:{self.config.get('ASR_PORT', '8767')}")
+        self.console.print(f"   Set ASR_URL=http://host.docker.internal:{self.config.get('ASR_PORT', '8767')}")
backends/advanced/src/advanced_omi_backend/services/knowledge_graph/queries.py-303-309 (1)

303-309: ⚠️ Potential issue | 🟡 Minor

LIMIT $limit has no effect on an aggregated single-row result.

collect() aggregates all nodes/edges into a single row, so LIMIT $limit always returns that one row regardless of the limit value. If the intent is to cap the number of nodes returned, the limit needs to be applied before aggregation.

Suggested fix
 GET_USER_GRAPH = """
 MATCH (e:Entity {user_id: $user_id})
+WITH e
+ORDER BY e.updated_at DESC
+LIMIT $limit
+WITH collect(e) as limited_nodes
+UNWIND limited_nodes as e
 OPTIONAL MATCH (e)-[r]->(e2:Entity {user_id: $user_id})
-WITH collect(DISTINCT e) as nodes, collect(DISTINCT {source: startNode(r).id, target: endNode(r).id, type: type(r), properties: properties(r)}) as edges
+WHERE e2 IN limited_nodes
+WITH limited_nodes as nodes, collect(DISTINCT {source: startNode(r).id, target: endNode(r).id, type: type(r), properties: properties(r)}) as edges
 RETURN nodes, edges
-LIMIT $limit
 """
backends/advanced/src/advanced_omi_backend/routers/modules/knowledge_graph_routes.py-43-51 (1)

43-51: ⚠️ Potential issue | 🟡 Minor

entity_type query parameter is not validated against EntityType enum.

An invalid entity_type value (e.g., "banana") will be passed through to the service and Neo4j, silently returning zero results instead of a 400 error. Consider validating it against the enum.

Suggested fix
+from advanced_omi_backend.services.knowledge_graph import EntityType
+
+VALID_ENTITY_TYPES = {e.value for e in EntityType}
+
 @router.get("/entities")
 async def get_entities(
     current_user: User = Depends(current_active_user),
     entity_type: Optional[str] = Query(
         default=None,
         description="Filter by entity type (person, place, organization, event, thing)",
     ),
     limit: int = Query(default=100, ge=1, le=500),
 ):
+    if entity_type and entity_type not in VALID_ENTITY_TYPES:
+        raise HTTPException(status_code=400, detail=f"Invalid entity type. Must be one of: {sorted(VALID_ENTITY_TYPES)}")
backends/advanced/src/advanced_omi_backend/services/knowledge_graph/service.py-70-71 (1)

70-71: ⚠️ Potential issue | 🟡 Minor

Default Neo4j password "password" is an insecure fallback.

If NEO4J_PASSWORD is unset, the service silently connects with the well-known default credential. Consider removing the default so that a missing env var causes an immediate, obvious failure rather than a silent misconfiguration in production.

backends/advanced/src/advanced_omi_backend/services/knowledge_graph/service.py-555-562 (1)

555-562: ⚠️ Potential issue | 🟡 Minor

Potential KeyError if the query result row doesn't contain deleted_count.

Lines 561 and 654 use results[0]["deleted_count"] — a KeyError will propagate if the Cypher query structure changes or returns unexpected columns. Use .get() for defensive access, consistent with the pattern used elsewhere in this file.

-        deleted = results[0]["deleted_count"] if results else 0
+        deleted = results[0].get("deleted_count", 0) if results else 0

Same applies to delete_promise at Line 654.
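The failure mode is easy to demonstrate on a stand-in result row (hypothetical data, not an actual Neo4j response):

```python
# Stand-in for a Neo4j result row whose column name changed unexpectedly.
results = [{"nodes_removed": 3}]  # no "deleted_count" key

# Bracket access raises KeyError when the expected column is missing.
try:
    deleted = results[0]["deleted_count"] if results else 0
except KeyError:
    deleted = None  # the error propagates unless caught

# Defensive access degrades to the default instead.
safe_deleted = results[0].get("deleted_count", 0) if results else 0
```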

backends/advanced/src/advanced_omi_backend/services/knowledge_graph/service.py-134-138 (1)

134-138: ⚠️ Potential issue | 🟡 Minor

Unused variable conv_entity_id.

The return value of _create_conversation_entity is captured but never referenced. If the side effect (creating the node) is all that matters, drop the assignment to silence the lint warning and clarify intent.

-            conv_entity_id = await self._create_conversation_entity(
+            await self._create_conversation_entity(
backends/advanced/src/advanced_omi_backend/services/knowledge_graph/service.py-835-841 (1)

835-841: ⚠️ Potential issue | 🟡 Minor

shutdown_knowledge_graph_service should acquire _service_lock to avoid a race with get_knowledge_graph_service.

Without the lock, a concurrent call to get_knowledge_graph_service can re-create the singleton between the None check and the shutdown assignment, leaking a freshly-created instance.

🔒 Proposed fix
 def shutdown_knowledge_graph_service() -> None:
     """Shutdown the global knowledge graph service."""
     global _knowledge_graph_service
-
-    if _knowledge_graph_service is not None:
-        _knowledge_graph_service.shutdown()
-        _knowledge_graph_service = None
+    with _service_lock:
+        if _knowledge_graph_service is not None:
+            _knowledge_graph_service.shutdown()
+            _knowledge_graph_service = None
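A minimal, self-contained sketch of the lock-protected singleton pattern being proposed (the `KnowledgeGraphService` body here is a hypothetical stand-in; only the getter/shutdown names mirror the code under review):

```python
import threading

class KnowledgeGraphService:
    """Hypothetical stand-in for the real service."""
    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True

_knowledge_graph_service = None
_service_lock = threading.Lock()

def get_knowledge_graph_service() -> KnowledgeGraphService:
    global _knowledge_graph_service
    with _service_lock:
        if _knowledge_graph_service is None:
            _knowledge_graph_service = KnowledgeGraphService()
        return _knowledge_graph_service

def shutdown_knowledge_graph_service() -> None:
    global _knowledge_graph_service
    # Holding the same lock means no concurrent get() can re-create the
    # singleton between the None check and the reset below.
    with _service_lock:
        if _knowledge_graph_service is not None:
            _knowledge_graph_service.shutdown()
            _knowledge_graph_service = None
```

Because both functions serialize on `_service_lock`, a `get` call either observes the old instance before shutdown or creates a fresh one after it; no instance can leak between the check and the reset.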

Comment on lines +119 to +123
DELETE_ENTITY = """
MATCH (e:Entity {id: $id, user_id: $user_id})
DETACH DELETE e
RETURN count(e) as deleted_count
"""

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Neo4j Cypher DETACH DELETE RETURN count behavior

💡 Result:

In Cypher, DETACH DELETE removes the matched nodes (and all their relationships), but it does not end the row stream. You can still RETURN count(*) because count(*) counts rows, not the deleted entities themselves. (neo4j.com)

What RETURN count(*) actually counts

  • MATCH ... WITH n ... DETACH DELETE n RETURN count(*) returns the number of rows that reached the RETURN.
  • If each row contains one node n, that is often “nodes deleted this batch”.
  • But if your MATCH pattern can produce duplicate rows for the same node (e.g., because it expands across relationships), then count(*) can overcount compared to the number of distinct nodes deleted.

Recommended pattern (count distinct nodes deleted)

MATCH (n:Label)
WHERE ...
WITH DISTINCT n
LIMIT 10000
DETACH DELETE n
RETURN count(*) AS nodesDeleted;

Why this is commonly used in batching loops

Neo4j’s own “large delete” guidance uses the pattern “delete a batch, RETURN count(*)”, and you repeat until it returns 0. (neo4j.com)

If you want “how many will be affected” before deleting

You generally run a separate counting query (e.g., count(n) and relationship counts) prior to DETACH DELETE. (neo4j.com)


🏁 Script executed:

# View the specific query in question and other similar patterns
cat -n backends/advanced/src/advanced_omi_backend/services/knowledge_graph/queries.py | sed -n '119,123p; 238,239p; 317,318p'

Repository: SimpleOpenSoftware/chronicle

Length of output: 351


🏁 Script executed:

# Check if there are any other DELETE patterns in the file
rg -n "DETACH DELETE|DELETE.*RETURN" backends/advanced/src/advanced_omi_backend/services/knowledge_graph/queries.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 171


Fix DETACH DELETE count pattern—use count(*) instead of count(e).

After DETACH DELETE e, the variable e is no longer bound, so count(e) returns 0 regardless of whether nodes were deleted. This breaks deletion detection in the service. Use count(*) to count processed rows instead.

Affected queries:

  • Line 119–123: DELETE_ENTITY
  • Line 238–239: DELETE_PROMISE
  • Line 317–318: DELETE_USER_ENTITIES

Replace RETURN count(e) as deleted_count with RETURN count(*) as deleted_count (and similarly for p).

🤖 Prompt for AI Agents
In
`@backends/advanced/src/advanced_omi_backend/services/knowledge_graph/queries.py`
around lines 119 - 123, The DELETE queries use RETURN count(e) or count(p) after
DETACH DELETE which unbinds those variables; update the three query strings
DELETE_ENTITY, DELETE_PROMISE, and DELETE_USER_ENTITIES to return count(*)
instead (e.g., replace "RETURN count(e) as deleted_count" with "RETURN count(*)
as deleted_count" and similarly replace any "count(p)" with "count(*)") so
deletion counts reflect processed rows rather than an unbound variable.

Comment on lines 250 to 276
# Extract entities and relationships to knowledge graph (if enabled)
try:
from advanced_omi_backend.model_registry import get_config
config = get_config()
kg_enabled = config.get("memory", {}).get("knowledge_graph", {}).get("enabled", False)

if kg_enabled:
from advanced_omi_backend.services.knowledge_graph import get_knowledge_graph_service

kg_service = get_knowledge_graph_service()
kg_result = await kg_service.process_conversation(
conversation_id=conversation_id,
transcript=full_conversation,
user_id=user_id,
conversation_name=conversation_model.title if hasattr(conversation_model, 'title') else None,
)
if kg_result.get("entities", 0) > 0:
logger.info(
f"🔗 Knowledge graph: extracted {kg_result.get('entities', 0)} entities, "
f"{kg_result.get('relationships', 0)} relationships, "
f"{kg_result.get('promises', 0)} promises from {conversation_id}"
)
else:
logger.debug("Knowledge graph extraction disabled in config")
except Exception as e:
# Knowledge graph extraction is optional - don't fail the job
logger.warning(f"⚠️ Knowledge graph extraction failed (non-fatal): {e}")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that get_config exists in model_registry module
echo "=== Searching for get_config function definitions ==="
rg -n 'def get_config\b' --type=py
echo ""
echo "=== Check model_registry for get_config (definitions or re-exports) ==="
fd -t f 'model_registry.py' --exec cat -n {} \;
echo ""
echo "=== Check imports of get_config from model_registry ==="
rg -n 'from advanced_omi_backend\.model_registry import.*get_config' --type=py
echo ""
echo "=== Check imports of get_config from config ==="
rg -n 'from advanced_omi_backend\.config import.*get_config' --type=py
echo ""
echo "=== Search for all get_config usages in memory_jobs.py ==="
rg -n 'get_config' backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py

Repository: SimpleOpenSoftware/chronicle

Length of output: 13647


Fix critical import error on line 252.

The import from advanced_omi_backend.model_registry import get_config will fail at runtime. The get_config function is defined in advanced_omi_backend.config and is only used internally by model_registry—it is not re-exported. Change line 252 to:

from advanced_omi_backend.config import get_config

The KG extraction logic itself is well-structured with config-gating and non-fatal error handling, but this import must be corrected before the code will function.

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 274-274: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@backends/advanced/src/advanced_omi_backend/workers/memory_jobs.py` around
lines 250 - 276, The import for get_config is incorrect and will raise at
runtime; replace the import in the try block that currently references
advanced_omi_backend.model_registry with the correct module
advanced_omi_backend.config so that get_config() resolves; update the import
statement near the KG extraction logic (where get_config is called, surrounding
variables include kg_enabled, kg_service, and the call
kg_service.process_conversation that uses conversation_id, full_conversation,
user_id and conversation_model.title) to import get_config from
advanced_omi_backend.config.

Comment on lines 315 to 321
for seg_data in processed.get("segments", []):
text = seg_data.get("text", "").strip()
start = seg_data.get("start_time", seg_data.get("start", 0.0))
end = seg_data.get("end_time", seg_data.get("end", 0.0))
speaker_raw = seg_data.get("speaker_id", seg_data.get("speaker"))
# Convert speaker to string (VibeVoice returns int)
speaker_id = f"Speaker {speaker_raw}" if speaker_raw is not None else None

⚠️ Potential issue | 🔴 Critical

Double "Speaker " prefix when using the fallback parser.

_parse_vibevoice_output (line 291) already formats speaker as f"Speaker {seg.get('Speaker', 0)}" (e.g., "Speaker 0"). In _map_to_result, line 321 wraps the value again: f"Speaker {speaker_raw}", producing "Speaker Speaker 0".

This only affects the fallback path (when post_process_transcription fails), but it corrupts speaker labels.

🐛 Proposed fix
             speaker_raw = seg_data.get("speaker_id", seg_data.get("speaker"))
             # Convert speaker to string (VibeVoice returns int)
-            speaker_id = f"Speaker {speaker_raw}" if speaker_raw is not None else None
+            if speaker_raw is None:
+                speaker_id = None
+            elif isinstance(speaker_raw, int):
+                speaker_id = f"Speaker {speaker_raw}"
+            else:
+                speaker_id = str(speaker_raw)
🤖 Prompt for AI Agents
In `@extras/asr-services/providers/vibevoice/transcriber.py` around lines 315 -
321, The fallback path in _map_to_result is double-prefixing speaker labels
because _parse_vibevoice_output already returns strings like "Speaker 0", but
_map_to_result blindly wraps speaker_raw with f"Speaker {speaker_raw}"; update
_map_to_result to detect and avoid double-prefixing by: when building
speaker_id, if speaker_raw is None keep None; if speaker_raw is an int or a
numeric string format as "Speaker {n}"; if speaker_raw is a string that already
starts with "Speaker " use it as-is; otherwise convert to string without adding
an extra "Speaker " prefix. Reference _parse_vibevoice_output and _map_to_result
to locate the related logic.

Comment on lines 159 to 174
ASR Capabilities Are From Known Set
[Documentation] Verify reported capabilities are valid known capabilities
... Known capabilities: timestamps, word_timestamps, segments, diarization
[Tags] infra

${info}= Get ASR Service Info ${ASR_URL}

# Define known capabilities
@{known_caps}= Create List timestamps word_timestamps segments diarization

# All reported capabilities should be known
FOR ${cap} IN @{info}[capabilities]
${is_known}= Evaluate $cap in $known_caps
Should Be True ${is_known}
... Unknown capability reported: ${cap}. Known: ${known_caps}
END

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for capabilities reported by providers
rg -n "capabilities" --type=py -C3 -g '*/providers/*'

Repository: SimpleOpenSoftware/chronicle

Length of output: 220


🏁 Script executed:

# Search for README and check documented capabilities
find . -name "README*" -o -name "readme*" | head -5

Repository: SimpleOpenSoftware/chronicle

Length of output: 166


🏁 Script executed:

# Check the ASR service info implementation to see what capabilities are available
rg -n "capabilities" --type=py | grep -E "(get|return|info)" -i | head -20

Repository: SimpleOpenSoftware/chronicle

Length of output: 2468


🏁 Script executed:

# Look for the ASR service info endpoint or method implementation
find . -name "*.py" -type f | xargs rg -l "Get ASR Service Info\|/info" | head -5

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Look at the provider implementations to see what capabilities they return
cat extras/asr-services/providers/transformers/service.py | head -150

Repository: SimpleOpenSoftware/chronicle

Length of output: 4120


🏁 Script executed:

# Check the faster_whisper provider
cat extras/asr-services/providers/faster_whisper/service.py | head -150

Repository: SimpleOpenSoftware/chronicle

Length of output: 4267


🏁 Script executed:

# Check the README.md for documented capabilities
rg -n "capabilities" README.md -C2

Repository: SimpleOpenSoftware/chronicle

Length of output: 54


🏁 Script executed:

# Check the test file to see what's being tested
cat tests/asr/protocol_tests.robot | head -180

Repository: SimpleOpenSoftware/chronicle

Length of output: 7391


Known capabilities set is incomplete and will cause test failures.

The test's known capabilities list [timestamps, word_timestamps, segments, diarization] does not match what the actual providers report:

  • Transformers provider reports: [timestamps, word_timestamps, language_detection]
  • Faster-Whisper provider reports: [timestamps, word_timestamps, language_detection, vad_filter, translation]

The test will fail because it checks if each reported capability exists in the known set. Both providers report language_detection, which is not in the test's known set. Faster-Whisper also reports vad_filter and translation, which are similarly missing.

Additionally, the test documentation incorrectly lists segments and diarization as known capabilities, but neither provider actually reports these.

Proposed fix — update known_caps to match actual provider outputs
-    @{known_caps}=    Create List    timestamps    word_timestamps    segments    diarization
+    @{known_caps}=    Create List    timestamps    word_timestamps    language_detection    vad_filter    translation
🤖 Prompt for AI Agents
In `@tests/asr/protocol_tests.robot` around lines 159 - 174, The known capability
set in the "ASR Capabilities Are From Known Set" Robot test is out of date:
update the @{known_caps} list (used by the test that calls Get ASR Service Info
and iterates over ${info}[capabilities]) to match actual provider outputs by
including language_detection, vad_filter, and translation and removing segments
and diarization so the test validates reported capabilities correctly; ensure
the test documentation string is updated to reflect the new canonical set.

github-actions bot commented Feb 6, 2026

⚠️ Robot Framework Test Results (No API Keys)

Status: ❌ Some tests failed

ℹ️ Note: This run excludes tests requiring external API keys (Deepgram, OpenAI).
Tests tagged with requires-api-keys will run on dev/main branches.

| Metric | Count |
| --- | --- |
| ✅ Passed | 102 |
| ❌ Failed | 17 |
| 📊 Total | 119 |


* Enhance setup utilities and wizard functionality

- Introduced `detect_tailscale_info` function to automatically retrieve Tailscale DNS name and IP address, improving user experience for service configuration.
- Added `detect_cuda_version` function to identify the system's CUDA version, streamlining compatibility checks for GPU-based services.
- Updated `wizard.py` to utilize the new detection functions, enhancing service selection and configuration processes based on user input.
- Improved error handling and user feedback in service setup, ensuring clearer communication during configuration steps.
- Refactored existing code to improve maintainability and code reuse across setup utilities.

* Update ASR service capabilities and improve speaker identification handling

- Modified the capabilities of the VibeVoice ASR provider to include 'speaker_identification' and 'long_form', enhancing its feature set.
- Adjusted the speaker identification logic in the VibeVoiceTranscriber to prevent double-prefixing and ensure accurate speaker representation.
- Updated protocol tests to reflect the expanded list of known ASR capabilities, ensuring comprehensive validation of reported features.

* Refactor audio recording controls for improved UI and functionality

- Replaced MicOff icon with Square icon in MainRecordingControls and SimplifiedControls for a more intuitive user experience.
- Enhanced button interactions to streamline recording start/stop actions, including a pulsing effect during recording.
- Updated status messages and button states to provide clearer feedback on recording status and actions.
- Improved accessibility by ensuring buttons are disabled appropriately based on recording state and microphone access.

* chore: test docs and test improvements (#288)

* Enhance test environment setup and configuration

- Added a new interactive setup script for configuring test API keys (Deepgram, OpenAI) to streamline the testing process.
- Introduced a template for the .env.test file to guide users in setting up their API keys.
- Updated the Makefile to include a new 'configure' target for setting up API keys.
- Enhanced the start-containers script to warn users if API keys are still set to placeholder values, improving user awareness during testing.
- Updated .gitignore to include the new .env.test.template file.

* Remove outdated documentation and restructure feature overview

- Deleted the `features.md` file, consolidating its content into the new `overview.md` for a more streamlined documentation structure.
- Updated `init-system.md` to link to the new `overview.md` instead of the removed `features.md`.
- Removed `ports-and-access.md` as its content was integrated into other documentation files, enhancing clarity and reducing redundancy.
- Revised the `README.md` in the advanced backend to reflect the new naming conventions and updated links to documentation.
- Introduced a new `plugin-development-guide.md` to assist users in creating custom plugins, expanding the documentation for developers.

* tech debt
AnkushMalaker merged commit 8c31635 into dev Feb 7, 2026
2 of 3 checks passed
github-actions bot commented Feb 7, 2026

⚠️ Robot Framework Test Results (No API Keys)

Status: ❌ Some tests failed

ℹ️ Note: This run excludes tests requiring external API keys (Deepgram, OpenAI).
Tests tagged with requires-api-keys will run on dev/main branches.

| Metric | Count |
| --- | --- |
| ✅ Passed | 102 |
| ❌ Failed | 17 |
| 📊 Total | 119 |


AnkushMalaker deleted the feat/vibevoice-asr branch February 7, 2026 02:30